————————————————————————————————————————–

Part 1: Data Exploration

Data Summary

The data set we are exploring concerns a binary outcome: whether the crime rate for a particular neighborhood is above (1) or below (0) the median. There are 466 rows of data, each representing a neighborhood in the Boston metropolitan area. For each neighborhood we are provided with 13 attributes that could potentially be used as predictor variables and one response variable (“target”) that indicates whether or not the neighborhood has an above-median crime rate. Our assignment is to predict whether a neighborhood is likely to have an above-median crime rate based on the 13 attributes provided in the data set. A summary table for the data set is provided below.
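Because the response is defined by a median split, the two classes should be roughly balanced by construction (exactly balanced when the median is computed from the same set of distinct values). A minimal sketch of deriving such a 0/1 target from a continuous rate; `crime_rate` and its exponential distribution are simulated assumptions for illustration, not the assignment's data:

```r
# Minimal sketch: build a 0/1 "above-median" target from a continuous rate.
# The values are simulated; they are NOT the assignment data.
set.seed(42)
n_neighborhoods <- 466
crime_rate <- rexp(n_neighborhoods, rate = 0.3)  # hypothetical skewed rates

# 1 if the rate is above the median, 0 otherwise
target <- as.integer(crime_rate > median(crime_rate))

# With an even number of distinct values, exactly half fall above the median
table(target)
```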

————————————————————————————————————————–

Part 2: Data Cleaning

Ordinal Imputations

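Numeric variables with missing values, shown as Variable(column index) followed by the number of missing values:

LotFrontage(4) 259, GarageYrBlt(60) 81, MasVnrArea(27) 8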
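A minimal sketch of how a missing-value list in this format can be produced, followed by median imputation of numeric columns. The data frame `homes` and its values are hypothetical, and median imputation is an assumed method; the report does not state which imputation it used.

```r
# Hypothetical stand-in for the report's data frame
homes <- data.frame(
  Id          = 1:6,
  LotFrontage = c(65, NA, 80, NA, 70, 60),
  MasVnrArea  = c(196, 0, NA, 0, 350, 0),
  Alley       = c(NA, "Grvl", NA, NA, "Pave", NA)
)

# Missing-value count per column, labelled Variable(column index)
na_counts <- colSums(is.na(homes))
na_counts <- na_counts[na_counts > 0]
col_index <- match(names(na_counts), names(homes))
na_labels <- paste0(names(na_counts), "(", col_index, ") ", na_counts)
print(na_labels)

# Median imputation for numeric columns only (an assumed strategy);
# non-numeric columns such as Alley are left untouched here
for (col in names(homes)) {
  x <- homes[[col]]
  if (is.numeric(x) && anyNA(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    homes[[col]] <- x
  }
}
```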

Factor Imputations

Factor variables with missing values, shown as Variable(column index) followed by the number of non-missing values:

Electrical(43) 1459, MasVnrType(26) 1452, BsmtQual(31) 1423, BsmtCond(32) 1423, BsmtFinType1(34) 1423, BsmtExposure(33) 1422, BsmtFinType2(36) 1422, GarageType(59) 1379, GarageFinish(61) 1379, GarageQual(64) 1379, GarageCond(65) 1379, FireplaceQu(58) 770, Fence(74) 281, Alley(7) 91, MiscFeature(75) 54, PoolQC(73) 7
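The model summary in Part 3 contains levels such as BsmtQualNone, FenceNone and PoolQCNone. This suggests, though the report does not say so, that missing factor values were recoded as an explicit “None” level before modeling. For columns such as PoolQC, Fence and Alley, NA in the Ames data usually marks an absent feature rather than an unknown value, which is why a separate level is a common choice. A minimal sketch with a hypothetical `homes` data frame:

```r
# Hypothetical stand-in data with NA in factor columns
homes <- data.frame(
  PoolQC = factor(c(NA, "Gd", NA, "Fa")),
  Fence  = factor(c("MnPrv", NA, NA, "GdWo")),
  Price  = c(200000, 350000, 180000, 260000)
)

# Replace NA with an explicit "None" level (assumed recoding)
na_to_none <- function(f) {
  x <- as.character(f)
  x[is.na(x)] <- "None"
  factor(x)
}

# Apply only to factor columns; numeric columns are unchanged
factor_cols <- vapply(homes, is.factor, logical(1))
homes[factor_cols] <- lapply(homes[factor_cols], na_to_none)

str(homes)
```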

Descriptive Statistics

(col = column index in the data set; n = number of non-missing values; all statistics rounded to whole numbers)

variable       col     n    mean     sd  median    min     max   range  skew  kurtosis    se
MSSubClass       2  1460      57     42      50     20     190     170     1         2     1
LotFrontage      4  1201      70     24      69     21     313     292     2        17     1
LotArea          5  1460   10517   9981    9478   1300  215245  213945    12       202   261
OverallQual     18  1460       6      1       6      1      10       9     0         0     0
OverallCond     19  1460       6      1       5      1       9       8     1         1     0
YearBuilt       20  1460    1971     30    1973   1872    2010     138    -1         0     1
YearRemodAdd    21  1460    1985     21    1994   1950    2010      60    -1        -1     1
MasVnrArea      27  1452     104    181       0      0    1600    1600     3        10     5
BsmtFinSF1      35  1460     444    456     384      0    5644    5644     2        11    12
BsmtFinSF2      37  1460      47    161       0      0    1474    1474     4        20     4
BsmtUnfSF       38  1460     567    442     478      0    2336    2336     1         0    12
TotalBsmtSF     39  1460    1057    439     992      0    6110    6110     2        13    11
X1stFlrSF       44  1460    1163    387    1087    334    4692    4358     1         6    10
X2ndFlrSF       45  1460     347    437       0      0    2065    2065     1        -1    11
LowQualFinSF    46  1460       6     49       0      0     572     572     9        83     1
GrLivArea       47  1460    1515    525    1464    334    5642    5308     1         5    14
BsmtFullBath    48  1460       0      1       0      0       3       3     1        -1     0
BsmtHalfBath    49  1460       0      0       0      0       2       2     4        16     0
FullBath        50  1460       2      1       2      0       3       3     0        -1     0
HalfBath        51  1460       0      1       0      0       2       2     1        -1     0
BedroomAbvGr    52  1460       3      1       3      0       8       8     0         2     0
KitchenAbvGr    53  1460       1      0       1      0       3       3     4        21     0
TotRmsAbvGrd    55  1460       7      2       6      2      14      12     1         1     0
Fireplaces      57  1460       1      1       1      0       3       3     1         0     0
GarageYrBlt     60  1379    1979     25    1980   1900    2010     110    -1         0     1
GarageCars      62  1460       2      1       2      0       4       4     0         0     0
GarageArea      63  1460     473    214     480      0    1418    1418     0         1     6
WoodDeckSF      67  1460      94    125       0      0     857     857     2         3     3
OpenPorchSF     68  1460      47     66      25      0     547     547     2         8     2
EnclosedPorch   69  1460      22     61       0      0     552     552     3        10     2
X3SsnPorch      70  1460       3     29       0      0     508     508    10       123     1
ScreenPorch     71  1460      15     56       0      0     480     480     4        18     1
PoolArea        72  1460       3     40       0      0     738     738    15       222     1
MiscVal         76  1460      43    496       0      0   15500   15500    24       698    13
MoSold          77  1460       6      3       6      1      12      11     0         0     0
YrSold          78  1460    2008      1    2008   2006    2010       4     0        -1     0
SalePrice       81  1460  180921  79443  163000  34900  755000  720100     2         6  2079
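The columns in the table above can be reproduced in base R. The sketch below assumes the “type 3” skewness and kurtosis estimators (moment ratios rescaled by the sample standard deviation, with kurtosis reported as excess kurtosis), which we believe match the defaults of psych::describe(). The report does not say which function produced its table, so treat this as an approximation; `describe_one` is a hypothetical helper name.

```r
# Hypothetical helper: one row of descriptive statistics for a numeric vector
describe_one <- function(x) {
  x <- x[!is.na(x)]                              # n counts non-missing values
  n <- length(x)
  m <- mean(x)
  s <- sd(x)                                     # sample standard deviation
  g1 <- mean((x - m)^3) / mean((x - m)^2)^1.5    # population skewness
  g2 <- mean((x - m)^4) / mean((x - m)^2)^2      # population kurtosis
  c(n = n, mean = m, sd = s, median = median(x),
    min = min(x), max = max(x), range = max(x) - min(x),
    skew = g1 * ((n - 1) / n)^1.5,               # assumed "type 3" skewness
    kurtosis = g2 * ((n - 1) / n)^2 - 3,         # assumed "type 3" excess kurtosis
    se = s / sqrt(n))                            # standard error of the mean
}

d <- describe_one(c(1, 2, 3, 4, 5, NA))
round(d, 3)
```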

Barplots & Categorical Data

Histograms for Ordinal Data

Conclusion of Data Exploration

During our data exploration efforts we identified one or more variables that can justifiably be ignored during model building, and we ranked the correlations among the predictor variables, which can guide the order in which variables are included during the model building process. Furthermore, we identified recurring variable values that require further investigation to determine whether additional action is needed to make use of those rows in our analysis. Finally, we identified three variables (. . . ) that are good candidates for transformation during the Data Preparation process.
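One way to rank correlations among predictors in base R is to compute the correlation matrix, keep each pair once, and sort by absolute Pearson correlation. A minimal sketch; the data frame `df` and its columns `a` to `d` are simulated assumptions:

```r
# Simulated predictors with known relationships
set.seed(1)
n <- 200
df <- data.frame(a = rnorm(n))
df$b <- 0.9 * df$a + rnorm(n, sd = 0.3)   # strongly related to a
df$c <- rnorm(n)                          # unrelated to a and b
df$d <- -df$c + rnorm(n)                  # negatively related to c

cm <- cor(df)
# Upper triangle only, so each pair appears once
idx <- which(upper.tri(cm), arr.ind = TRUE)
ranked <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
# Sort by absolute correlation, strongest first
ranked <- ranked[order(-abs(ranked$r)), ]
head(ranked)
```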

————————————————————————————————————————–

Part 2 - Data Preparation

Our Data Preparation efforts included investigating what appeared to be an abnormally large number of records sharing identical values, proposing the conversion of two predictor variables to binary “0/1” factor variables, and simplifying the interpretation of the ‘black’ variable via a mathematical transformation. We also considered transforming one or more of the predictor variables that have skewed distributions, but chose not to apply any such transforms before model building, since logistic regression does not require normally distributed predictors. Transforms can still be applied later if the marginal model plots for a logistic regression model show systematic discrepancies between the fitted model and the observed data.
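A minimal sketch of two ideas from this paragraph on simulated data: recoding a mostly-zero predictor as a binary 0/1 factor, and fitting a logistic regression with glm(), which models the log-odds of the response and places no normality assumption on the predictors. The names `x1_raw`, `x2` and `target` are hypothetical; the report does not say which two predictors it converted. Marginal model plots for such a fit can be drawn with mmps() from the car package.

```r
set.seed(7)
n <- 466
dat <- data.frame(
  x1_raw = sample(c(0, 0, 0, 12.5, 25, 80), n, replace = TRUE),  # many zeros
  x2     = rnorm(n, mean = 6, sd = 0.7)
)

# Recode x1_raw as a binary factor: "1" if non-zero, "0" otherwise
dat$x1_any <- factor(as.integer(dat$x1_raw > 0), levels = c(0, 1))

# Simulate a 0/1 response whose log-odds depend on both predictors
eta <- -0.5 - 1.2 * (dat$x1_raw > 0) + 0.8 * (dat$x2 - 6)
dat$target <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression (binomial family, logit link by default)
fit <- glm(target ~ x1_any + x2, data = dat, family = binomial)
round(coef(summary(fit)), 3)
```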

————————————————————————————————————————–

Part 3 - Build Models

Model 1: Use the bestglm Function to Build a Model
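The heading refers to the bestglm package, which searches over predictor subsets and scores each candidate with an information criterion. As a hedged illustration of that idea only, the sketch below performs an exhaustive best-subset search in base R, scoring each lm() fit by BIC. It is not bestglm's implementation, and the data and names (`x1` to `x4`, `y`) are simulated assumptions. Exhaustive search fits 2^p models, so it is practical only for a modest number of predictors.

```r
set.seed(123)
n <- 300
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
y <- 2 + 1.5 * X$x1 - 2 * X$x3 + rnorm(n)   # only x1 and x3 matter
dat <- cbind(X, y = y)

predictors <- names(X)
best <- list(bic = Inf, vars = character(0))
for (k in 0:length(predictors)) {
  # All subsets of size k (the empty subset is the intercept-only model)
  subsets <- if (k == 0) list(character(0)) else combn(predictors, k, simplify = FALSE)
  for (vars in subsets) {
    rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
    fit <- lm(as.formula(paste("y ~", rhs)), data = dat)
    # Keep the subset with the lowest BIC seen so far
    if (BIC(fit) < best$bic) best <- list(bic = BIC(fit), vars = vars)
  }
}
best$vars
```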

# Final Model:
fit <- lm(SalePrice ~ MSSubClass + MSZoning + LotArea + Street + LandContour + 
    Utilities + LotConfig + LandSlope + Neighborhood + Condition1 + 
    Condition2 + BldgType + OverallQual + OverallCond + YearBuilt + 
    YearRemodAdd + RoofStyle + RoofMatl + Exterior1st + MasVnrType + 
    MasVnrArea + ExterQual + BsmtQual + BsmtCond + BsmtExposure + 
    BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + 
    BsmtFullBath + FullBath + BedroomAbvGr + KitchenAbvGr + KitchenQual + 
    TotRmsAbvGrd + Functional + Fireplaces + GarageCars + GarageArea + 
    GarageQual + GarageCond + WoodDeckSF + ScreenPorch + PoolArea + 
    PoolQC + Fence + MoSold + SaleCondition, data = AmesHomes)

summary(fit)
## 
## Call:
## lm(formula = SalePrice ~ MSSubClass + MSZoning + LotArea + Street + 
##     LandContour + Utilities + LotConfig + LandSlope + Neighborhood + 
##     Condition1 + Condition2 + BldgType + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + RoofStyle + RoofMatl + Exterior1st + 
##     MasVnrType + MasVnrArea + ExterQual + BsmtQual + BsmtCond + 
##     BsmtExposure + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
##     X2ndFlrSF + BsmtFullBath + FullBath + BedroomAbvGr + KitchenAbvGr + 
##     KitchenQual + TotRmsAbvGrd + Functional + Fireplaces + GarageCars + 
##     GarageArea + GarageQual + GarageCond + WoodDeckSF + ScreenPorch + 
##     PoolArea + PoolQC + Fence + MoSold + SaleCondition, data = AmesHomes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -178640   -9148       0    9867  178640 
## 
## Coefficients: (2 not defined because of singularities)
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -1.748e+06  1.679e+05 -10.414  < 2e-16 ***
## MSSubClass           -9.636e+01  4.576e+01  -2.106 0.035411 *  
## MSZoningFV            3.238e+04  1.121e+04   2.889 0.003924 ** 
## MSZoningRH            1.948e+04  1.117e+04   1.744 0.081424 .  
## MSZoningRL            2.377e+04  9.513e+03   2.499 0.012577 *  
## MSZoningRM            1.959e+04  8.869e+03   2.208 0.027386 *  
## LotArea               6.962e-01  9.487e-02   7.338 3.80e-13 ***
## StreetPave            3.361e+04  1.122e+04   2.995 0.002795 ** 
## LandContourHLS        9.211e+03  4.836e+03   1.905 0.057036 .  
## LandContourLow       -8.791e+03  5.928e+03  -1.483 0.138344    
## LandContourLvl        6.372e+03  3.449e+03   1.847 0.064945 .  
## UtilitiesNoSeWa      -3.255e+04  2.384e+04  -1.365 0.172392    
## LotConfigCulDSac      7.192e+03  2.986e+03   2.408 0.016165 *  
## LotConfigFR2         -6.352e+03  3.805e+03  -1.669 0.095280 .  
## LotConfigFR3         -1.326e+04  1.224e+04  -1.083 0.279024    
## LotConfigInside      -1.294e+03  1.657e+03  -0.781 0.434873    
## LandSlopeMod          5.623e+03  3.704e+03   1.518 0.129277    
## LandSlopeSev         -3.928e+04  1.034e+04  -3.797 0.000153 ***
## NeighborhoodBlueste   2.440e+03  1.826e+04   0.134 0.893734    
## NeighborhoodBrDale   -1.694e+03  1.030e+04  -0.164 0.869459    
## NeighborhoodBrkSide  -2.071e+03  8.794e+03  -0.235 0.813905    
## NeighborhoodClearCr  -1.158e+04  8.736e+03  -1.325 0.185417    
## NeighborhoodCollgCr  -8.600e+03  6.876e+03  -1.251 0.211205    
## NeighborhoodCrawfor   1.189e+04  8.091e+03   1.469 0.142070    
## NeighborhoodEdwards  -1.811e+04  7.587e+03  -2.387 0.017115 *  
## NeighborhoodGilbert  -1.182e+04  7.264e+03  -1.627 0.103985    
## NeighborhoodIDOTRR   -6.420e+03  1.001e+04  -0.641 0.521357    
## NeighborhoodMeadowV  -8.398e+03  1.051e+04  -0.799 0.424203    
## NeighborhoodMitchel  -2.165e+04  7.745e+03  -2.795 0.005258 ** 
## NeighborhoodNAmes    -1.614e+04  7.408e+03  -2.178 0.029555 *  
## NeighborhoodNoRidge   2.874e+04  7.994e+03   3.595 0.000337 ***
## NeighborhoodNPkVill   6.081e+03  1.043e+04   0.583 0.559994    
## NeighborhoodNridgHt   1.642e+04  7.062e+03   2.325 0.020233 *  
## NeighborhoodNWAmes   -1.892e+04  7.587e+03  -2.494 0.012762 *  
## NeighborhoodOldTown  -1.149e+04  9.040e+03  -1.271 0.203923    
## NeighborhoodSawyer   -1.080e+04  7.753e+03  -1.393 0.163995    
## NeighborhoodSawyerW  -2.851e+03  7.429e+03  -0.384 0.701207    
## NeighborhoodSomerst  -2.016e+03  8.567e+03  -0.235 0.814039    
## NeighborhoodStoneBr   3.561e+04  7.880e+03   4.520 6.75e-06 ***
## NeighborhoodSWISU    -5.781e+03  9.040e+03  -0.640 0.522592    
## NeighborhoodTimber   -1.241e+04  7.743e+03  -1.603 0.109216    
## NeighborhoodVeenker  -9.747e+01  9.859e+03  -0.010 0.992113    
## Condition1Feedr       7.073e+03  4.681e+03   1.511 0.131023    
## Condition1Norm        1.508e+04  3.884e+03   3.883 0.000108 ***
## Condition1PosA        7.414e+03  9.465e+03   0.783 0.433594    
## Condition1PosN        1.230e+04  6.964e+03   1.766 0.077692 .  
## Condition1RRAe       -1.210e+04  8.246e+03  -1.468 0.142452    
## Condition1RRAn        1.370e+04  6.476e+03   2.115 0.034626 *  
## Condition1RRNe       -5.063e+01  1.691e+04  -0.003 0.997611    
## Condition1RRNn        1.004e+04  1.198e+04   0.838 0.402226    
## Condition2Feedr      -1.082e+04  2.130e+04  -0.508 0.611621    
## Condition2Norm       -1.207e+04  1.832e+04  -0.659 0.510192    
## Condition2PosA        3.890e+04  2.999e+04   1.297 0.194894    
## Condition2PosN       -2.384e+05  2.584e+04  -9.225  < 2e-16 ***
## Condition2RRAe       -1.151e+05  4.156e+04  -2.770 0.005688 ** 
## Condition2RRAn       -1.572e+04  2.951e+04  -0.533 0.594249    
## Condition2RRNn       -1.077e+04  2.502e+04  -0.431 0.666849    
## BldgType2fmCon        3.730e+03  8.287e+03   0.450 0.652741    
## BldgTypeDuplex       -3.395e+03  6.223e+03  -0.546 0.585444    
## BldgTypeTwnhs        -1.528e+04  6.948e+03  -2.199 0.028019 *  
## BldgTypeTwnhsE       -1.091e+04  5.584e+03  -1.953 0.051002 .  
## OverallQual           6.683e+03  9.375e+02   7.129 1.67e-12 ***
## OverallCond           5.571e+03  7.670e+02   7.264 6.45e-13 ***
## YearBuilt             3.720e+02  6.099e+01   6.099 1.40e-09 ***
## YearRemodAdd          1.098e+02  5.058e+01   2.171 0.030088 *  
## RoofStyleGable        6.925e+03  1.740e+04   0.398 0.690721    
## RoofStyleGambrel      1.098e+04  1.889e+04   0.581 0.561127    
## RoofStyleHip          7.291e+03  1.746e+04   0.418 0.676322    
## RoofStyleMansard      2.177e+04  1.992e+04   1.093 0.274588    
## RoofStyleShed         9.154e+04  3.287e+04   2.785 0.005435 ** 
## RoofMatlCompShg       5.785e+05  4.330e+04  13.361  < 2e-16 ***
## RoofMatlMembran       6.664e+05  5.382e+04  12.383  < 2e-16 ***
## RoofMatlMetal         6.395e+05  5.361e+04  11.929  < 2e-16 ***
## RoofMatlRoll          5.701e+05  4.958e+04  11.499  < 2e-16 ***
## RoofMatlTar&Grv       5.772e+05  4.705e+04  12.268  < 2e-16 ***
## RoofMatlWdShake       5.700e+05  4.575e+04  12.458  < 2e-16 ***
## RoofMatlWdShngl       6.321e+05  4.404e+04  14.353  < 2e-16 ***
## Exterior1stAsphShn   -1.119e+04  2.396e+04  -0.467 0.640496    
## Exterior1stBrkComm    6.354e+02  1.854e+04   0.034 0.972668    
## Exterior1stBrkFace    1.665e+04  6.694e+03   2.488 0.012969 *  
## Exterior1stCBlock    -1.202e+04  2.531e+04  -0.475 0.634977    
## Exterior1stCemntBd    2.371e+03  7.066e+03   0.336 0.737297    
## Exterior1stHdBoard   -3.807e+03  6.055e+03  -0.629 0.529604    
## Exterior1stImStucc   -8.253e+03  2.406e+04  -0.343 0.731673    
## Exterior1stMetalSd    1.080e+03  5.933e+03   0.182 0.855546    
## Exterior1stPlywood   -6.775e+03  6.386e+03  -1.061 0.288875    
## Exterior1stStone     -6.108e+03  1.912e+04  -0.320 0.749359    
## Exterior1stStucco     9.139e+02  7.439e+03   0.123 0.902244    
## Exterior1stVinylSd   -2.243e+02  5.970e+03  -0.038 0.970036    
## Exterior1stWd Sdng   -9.220e+02  5.897e+03  -0.156 0.875778    
## Exterior1stWdShing   -3.675e+03  7.366e+03  -0.499 0.617975    
## MasVnrTypeBrkFace     6.694e+03  6.499e+03   1.030 0.303229    
## MasVnrTypeNone        1.009e+04  6.560e+03   1.538 0.124189    
## MasVnrTypeStone       1.084e+04  6.882e+03   1.575 0.115573    
## MasVnrArea            2.194e+01  5.569e+00   3.940 8.59e-05 ***
## ExterQualFa          -7.162e+03  9.706e+03  -0.738 0.460712    
## ExterQualGd          -2.249e+04  4.602e+03  -4.887 1.15e-06 ***
## ExterQualTA          -2.231e+04  5.071e+03  -4.399 1.18e-05 ***
## BsmtQualFa           -1.329e+04  5.904e+03  -2.251 0.024577 *  
## BsmtQualGd           -2.078e+04  3.141e+03  -6.615 5.39e-11 ***
## BsmtQualNone          1.063e+04  2.401e+04   0.443 0.658010    
## BsmtQualTA           -1.798e+04  3.834e+03  -4.691 3.00e-06 ***
## BsmtCondGd            1.074e+03  4.981e+03   0.216 0.829315    
## BsmtCondPo            4.130e+04  2.090e+04   1.976 0.048396 *  
## BsmtCondTA            4.082e+03  3.933e+03   1.038 0.299515    
## BsmtCondXa                   NA         NA      NA       NA    
## BsmtExposureGd        1.578e+04  2.855e+03   5.528 3.90e-08 ***
## BsmtExposureMn       -3.102e+03  2.861e+03  -1.084 0.278481    
## BsmtExposureNo       -6.046e+03  2.026e+03  -2.985 0.002892 ** 
## BsmtExposureXb       -1.563e+04  2.267e+04  -0.690 0.490539    
## BsmtFinSF1            3.457e+01  4.483e+00   7.711 2.47e-14 ***
## BsmtFinSF2            2.479e+01  5.639e+00   4.395 1.20e-05 ***
## BsmtUnfSF             1.842e+01  4.241e+00   4.344 1.51e-05 ***
## X1stFlrSF             4.797e+01  4.932e+00   9.727  < 2e-16 ***
## X2ndFlrSF             5.503e+01  3.465e+00  15.884  < 2e-16 ***
## BsmtFullBath          2.694e+03  1.732e+03   1.556 0.120056    
## FullBath              2.810e+03  1.905e+03   1.476 0.140309    
## BedroomAbvGr         -4.667e+03  1.263e+03  -3.694 0.000230 ***
## KitchenAbvGr         -1.538e+04  5.207e+03  -2.953 0.003200 ** 
## KitchenQualFa        -1.866e+04  5.693e+03  -3.277 0.001077 ** 
## KitchenQualGd        -2.305e+04  3.312e+03  -6.960 5.37e-12 ***
## KitchenQualTA        -2.304e+04  3.732e+03  -6.175 8.84e-10 ***
## TotRmsAbvGrd          2.107e+03  8.918e+02   2.362 0.018304 *  
## FunctionalMaj2       -6.493e+03  1.284e+04  -0.506 0.613186    
## FunctionalMin1        5.659e+03  7.952e+03   0.712 0.476807    
## FunctionalMin2        8.483e+03  7.856e+03   1.080 0.280420    
## FunctionalMod         1.292e+03  9.293e+03   0.139 0.889463    
## FunctionalSev        -3.609e+04  2.651e+04  -1.361 0.173620    
## FunctionalTyp         1.867e+04  6.848e+03   2.727 0.006484 ** 
## Fireplaces            2.626e+03  1.258e+03   2.087 0.037112 *  
## GarageCars            4.671e+03  2.148e+03   2.175 0.029807 *  
## GarageArea            1.414e+01  7.148e+00   1.978 0.048179 *  
## GarageQualFa         -1.233e+05  2.677e+04  -4.605 4.53e-06 ***
## GarageQualGd         -1.171e+05  2.752e+04  -4.254 2.25e-05 ***
## GarageQualPo         -1.374e+05  3.318e+04  -4.140 3.70e-05 ***
## GarageQualTA         -1.196e+05  2.655e+04  -4.506 7.20e-06 ***
## GarageQualXg         -1.372e+03  1.678e+04  -0.082 0.934851    
## GarageCondFa          1.078e+05  3.150e+04   3.421 0.000642 ***
## GarageCondGd          1.020e+05  3.251e+04   3.138 0.001737 ** 
## GarageCondPo          1.109e+05  3.381e+04   3.279 0.001070 ** 
## GarageCondTA          1.103e+05  3.119e+04   3.535 0.000421 ***
## GarageCondXh                 NA         NA      NA       NA    
## WoodDeckSF            1.071e+01  5.545e+00   1.932 0.053597 .  
## ScreenPorch           3.077e+01  1.188e+01   2.591 0.009683 ** 
## PoolArea              5.611e+02  1.646e+02   3.408 0.000673 ***
## PoolQCFa             -1.451e+05  2.558e+04  -5.672 1.74e-08 ***
## PoolQCGd             -1.161e+05  3.074e+04  -3.776 0.000166 ***
## PoolQCNone            1.816e+05  8.953e+04   2.029 0.042684 *  
## FenceGdWo             7.657e+03  4.631e+03   1.653 0.098507 .  
## FenceMnPrv            9.441e+03  3.779e+03   2.498 0.012602 *  
## FenceMnWw             1.081e+03  7.800e+03   0.139 0.889780    
## FenceNone             8.136e+03  3.465e+03   2.348 0.019007 *  
## MoSold               -3.489e+02  2.318e+02  -1.505 0.132534    
## SaleConditionAdjLand  9.769e+03  1.271e+04   0.768 0.442358    
## SaleConditionAlloca  -1.841e+03  8.139e+03  -0.226 0.821075    
## SaleConditionFamily   1.103e+03  5.763e+03   0.191 0.848277    
## SaleConditionNormal   5.739e+03  2.593e+03   2.213 0.027040 *  
## SaleConditionPartial  1.910e+04  3.638e+03   5.250 1.77e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22440 on 1304 degrees of freedom
## Multiple R-squared:  0.9287, Adjusted R-squared:  0.9202 
## F-statistic: 109.6 on 155 and 1304 DF,  p-value: < 2.2e-16
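The line “2 not defined because of singularities” and the NA rows for BsmtCondXa and GarageCondXh mean that lm() found those dummy columns to be exact linear combinations of other columns and dropped them. The degrees of freedom also tally with the data: 155 model terms plus the intercept plus 1304 residual degrees of freedom equals 1460 rows, and the adjusted R-squared follows as 1 - (1 - 0.9287)(1459/1304) ≈ 0.9202. A minimal sketch reproducing both points, with simulated data and hypothetical names (`g`, `is_c`):

```r
set.seed(99)
n <- 100
d <- data.frame(g = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
d$is_c <- as.integer(d$g == "c")          # duplicates the g = "c" dummy column
d$y <- 1 + 2 * (d$g == "b") + rnorm(n)

fit <- lm(y ~ g + is_c, data = d)
coef(fit)                        # the is_c estimate is NA (aliased)
names(which(is.na(coef(fit))))   # which coefficients were not defined
alias(fit)$Complete              # shows the exact linear dependency

# Adjusted R-squared from the summary above: n = 1460 rows, 1304 residual df
r2 <- 0.9287
adj_r2 <- 1 - (1 - r2) * (1460 - 1) / 1304
round(adj_r2, 4)
```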

————————————————————————————————————————–

Part 4 - Generate Test Predictions from Model

Barplots & Categorical Data

Figures (not reproduced): barplots titled “Zoning Proportions” and “Exterior1st Proportions”.
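The lost figures appear to have shown the proportion of rows in each category. A minimal base-R sketch of such a plot. The report itself appears to have used ggplot2 (its output included a ggplot2 label object and a stat_count warning), and the `zoning` values here are made up for illustration:

```r
# Hypothetical zoning codes, including one missing value
zoning <- factor(c("RL", "RL", "RM", "FV", "RL", "RH", "RM", NA))

# Category proportions; table() drops NA by default
props <- prop.table(table(zoning))

barplot(props, main = "Zoning Proportions", ylab = "Proportion")
```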